Add missing INLIN(ABL)E in the parallel-merge code #21

ulysses4ever · 2025-03-26T18:04:07Z

Depends-on: #18

If I manually remove toLinear, these are the numbers I'm seeing on 8 physical CPUs:

❯ cabal run benchrunner -- 10 "Mergesort" Seq 1000000 2>/dev/null | grep SELFTIMED
SELFTIMED: 0.206165608

❯ cabal run benchrunner -- 10 "Mergesort" Par 1000000 2>/dev/null | grep SELFTIMED
SELFTIMED: 0.256841832

❯ cabal run benchrunner -- 10 "Mergesort" Par 1000000 +RTS -N8 2>/dev/null | grep SELFTIMED
SELFTIMED: 0.189847072

ulysses4ever · 2025-03-26T18:18:36Z

It looks like toLinear doesn't make much difference: single-digit percentages. And it hurts the sequential version a little more than the parallel one.

ulysses4ever · 2025-04-01T19:40:00Z

I rebased on main but performance broke, sadly:

❯ cabal run benchrunner -- 10 "Mergesort" Seq 1000000 2>/dev/null | grep SELFTIMED
SELFTIMED: 0.213269371

❯ cabal run benchrunner -- 10 "Mergesort" Par 1000000 2>/dev/null | grep SELFTIMED
SELFTIMED: 1.150078171

(compare to the numbers in OP).

I'll have to get back to it...

ulysses4ever · 2025-04-01T21:46:08Z

The cause was a silly rebase mistake on my part. I'll fix it today.

ulysses4ever · 2025-04-02T02:14:11Z

I think this is good to go. Can someone take a look?

ulysses4ever · 2025-04-02T14:24:27Z

I'm expediting this in the interest of unblocking progress on other PRs. This is a performance-only patch. In my quick back-of-the-envelope evaluation, it brings Mergesort Par with one thread on par with Mergesort Seq and shows reasonable scaling with an increasing number of CPUs. The main content is INLIN(ABL)E pragmas and a manual worker/wrapper transformation in a couple of places.
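For readers unfamiliar with the two techniques, here is a minimal sketch of what they typically look like in GHC Haskell. It is not the code from this patch: `mergeBy` and `mergeOrd` are hypothetical names chosen for illustration, and plain lists stand in for whatever data structure the parallel merge actually uses.

```haskell
module Main (main) where

-- Hypothetical illustration only; none of these names come from the patch.

-- Manual worker/wrapper: the wrapper 'mergeBy' is non-recursive, so GHC can
-- inline it at call sites. The recursive worker 'go' closes over 'cmp', which
-- lets GHC specialise the loop once the comparison function is known.
{-# INLINE mergeBy #-}
mergeBy :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
mergeBy cmp = go
  where
    go [] ys = ys
    go xs [] = xs
    go xs@(x:xt) ys@(y:yt) = case cmp x y of
      GT -> y : go xs yt
      _  -> x : go xt ys  -- on EQ take from the left list, keeping the merge stable

-- INLINABLE keeps the unfolding available to importing modules, so GHC can
-- specialise this class-polymorphic function to a concrete 'Ord' instance.
{-# INLINABLE mergeOrd #-}
mergeOrd :: Ord a => [a] -> [a] -> [a]
mergeOrd = mergeBy compare

main :: IO ()
main = do
  print (mergeOrd [1, 3, 5] [2, 4, 6 :: Int])
  print (mergeBy (flip compare) [5, 3, 1] [6, 4, 2 :: Int])
```

Without such pragmas, a call from another module usually goes through a generic, dictionary-passing version of the function, which is a common reason for parallel code with one thread lagging behind its sequential counterpart.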

ulysses4ever force-pushed the performance-tuning-par branch 2 times, most recently from 4ada11e to 0a8b99b Compare April 1, 2025 19:37

ulysses4ever force-pushed the performance-tuning-par branch from 0a8b99b to ea30799 Compare April 2, 2025 02:09

ulysses4ever marked this pull request as ready for review April 2, 2025 02:14

add a bunch of missing INLIN(ABL)E in the parallel-merge code

ed50bc6

ulysses4ever force-pushed the performance-tuning-par branch from ea30799 to ed50bc6 Compare April 2, 2025 02:37

ulysses4ever merged commit c66e51d into main Apr 2, 2025
5 checks passed

ulysses4ever deleted the performance-tuning-par branch April 2, 2025 14:24

2 participants
